Pruna AI Releases Open-Source AI Model Optimization Framework for Efficient Compression
Pruna AI, a European startup focused on AI model compression algorithms, recently announced that it has open-sourced its optimization framework to help developers compress AI models more efficiently. The framework combines several efficiency methods, including caching, pruning, quantization, and distillation, with the aim of making models smaller and faster while preserving quality. It standardizes the saving and loading of compressed models, evaluates whether a compressed model's quality has significantly degraded, and measures the resulting performance gains.
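To make one of these techniques concrete, the sketch below shows a minimal symmetric int8 weight quantization, the kind of size-reduction step such a framework might apply. This is an illustrative assumption, not Pruna AI's actual API; all function names here are hypothetical.

```python
# Hypothetical illustration of int8 quantization, one of the compression
# methods mentioned in the article. Not Pruna AI's implementation.

def quantize_int8(weights):
    """Map float weights to int8-range integers plus a scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # symmetric range [-127, 127]
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.88]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Each recovered weight is within half a quantization step of the original.
```

A framework like the one described would pair such a transform with an evaluation pass, comparing the model's accuracy before and after compression to decide whether the quality loss is acceptable.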